Abstract

Performance of object-oriented database systems (OODBs) is still an issue to both designers and users nowadays. The aim of this paper is to propose a generic discrete-event random simulation model, called VOODB, in order to evaluate the performance of OODBs in general, and the performance of optimization methods like clustering in particular. Such optimization methods undoubtedly improve the performance of OODBs. Yet, they also always induce some overhead for the system. Therefore, it is important to evaluate their exact impact on overall performance. VOODB has been designed as a generic discrete-event random simulation model by putting to use a modelling approach, and has been validated by simulating the behavior of the O2 OODB and the Texas persistent object store. Since our final objective is to compare object clustering algorithms, some experiments have also been conducted on the DSTC clustering technique, which is implemented in Texas. To validate VOODB, performance results obtained by simulation for a given experiment have been compared to the results obtained by benchmarking the real systems in the same conditions. Benchmarking and simulation performance evaluations have been observed to be consistent, so it appears that simulation can be a reliable approach to evaluate the performance of OODBs.

Keywords: Object-oriented database systems, 
Object clustering, Performance evaluation, 
Discrete-event random simulation. 

1 Introduction 

The needs in terms of performance evaluation for Object-Oriented Database Management Systems (OODBMSs) remain strong for both designers and users. Furthermore, a priori evaluations (performed before a system is actually built or completed) appear necessary in a variety of situations. A system designer may need to test the efficiency of an optimization procedure a priori, or to adjust the parameters of a buffering technique. It is also very helpful for users to estimate a priori whether a given system is able to handle a given workload.

The challenge of comparing object clustering techniques motivated us to contribute to OODBMS performance evaluation. The principle of clustering is to store related objects close together on secondary storage. Hence, when one of these objects is loaded into main memory, all its related objects are also loaded at the same time. Subsequent accesses to these objects are thus main memory accesses, which are much faster than disk I/Os. However, clustering induces an overhead for the system (e.g., to reorganize the database, or to collect and maintain usage statistics), so it is important to gauge its true impact on overall performance. For this particular problem, a priori evaluation is very attractive since it avoids coding inefficient algorithms in existing systems.
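To make the clustering principle concrete, the following toy Python sketch (not VOODB code) counts how many distinct disk pages must be read to access a group of related objects under two placements. The page capacity, object identifiers, the group of related objects and the assumption that loaded pages are never evicted are all illustrative choices.

```python
# Toy illustration of object clustering (hypothetical data, not VOODB code).
# A placement maps each object identifier to the disk page that stores it.

def pages_loaded(access_sequence, placement):
    """Count disk reads needed to serve the accesses, assuming every
    loaded page stays in main memory (no page replacement)."""
    in_memory = set()
    reads = 0
    for obj in access_sequence:
        page = placement[obj]
        if page not in in_memory:   # page not in memory: one disk I/O
            in_memory.add(page)
            reads += 1
    return reads

def sequential_placement(n_objects, objects_per_page):
    # Objects stored in creation order: object i goes to page i // capacity.
    return {obj: obj // objects_per_page for obj in range(n_objects)}

def clustered_placement(n_objects, objects_per_page, cluster):
    # Related objects are written first so that they share pages,
    # followed by the remaining objects in creation order.
    order = list(cluster) + [o for o in range(n_objects) if o not in cluster]
    return {obj: slot // objects_per_page for slot, obj in enumerate(order)}

related = [0, 4, 8, 12]            # hypothetical group of related objects
seq = sequential_placement(16, 4)
clu = clustered_placement(16, 4, related)
print(pages_loaded(related, seq))  # 4
print(pages_loaded(related, clu))  # 1
```

The sketch only shows the benefit side; the reorganization and statistics overhead mentioned above is deliberately left out.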

Discrete-event random simulation constitutes a traditional approach to a priori performance evaluation. Numerous simulation languages and/or environments exist nowadays. They allow the simulation of various classes of systems (computer systems, networks, production systems...). However, simulation is not as widely used in the database domain as it could be. The main difficulty is to elaborate a "good" functioning model for a system. Such a model must be representative of the performance to evaluate, with the requested degree of precision. To this end, identifying the significant characteristics of a system and translating them into entities in the chosen simulation language often remains a specialist's task. Hence, users must call on consulting or specialized firms, which lengthens study times and increases costs.

In the field of OODBs, discrete-event random simulation has chiefly been used to validate proposals concerning optimization techniques, especially object clustering techniques. For instance, a dedicated model in PAWS was proposed in [Cha89] to validate a clustering and a buffering strategy in a CAD context. The objective was to find out how different optimization algorithms influence performance when the characteristics of the application accessing data vary, and which relationship exists between object clustering and parameters such as the read/write ratio. Discrete-event random simulation was also used in [Dar96] [Gay97] to compare the efficiency of different clustering strategies for OODBs. The proposed models were coded in SLAM II.

Some other studies use simulation approaches that are not discrete-event random simulation approaches, but are nevertheless interesting. [Che91] conducted simulations to show the effectiveness of different clustering schemes when parameters such as the read/write ratio vary. The authors particularly focused on disk drive modelling. The CLAB (CLustering LAboratory) software [Tsa92] was designed to compare graph partitioning algorithms applied to object clustering. It consists of a set of Unix tools programmed in C++, which can be assembled in various configurations. Yet other studies from the fields of distributed or parallel databases prove helpful, e.g., the modelling methodologies from [Iae95] or the workload models from [HeMlBit95].

These different studies bring forth the following observations.

First, most proposed simulation models are dedicated: they have been designed to evaluate the performance of a given optimization method. Furthermore, they only exploit one type of OODBMS, while various architectures influencing performance are possible (object server, page server, etc.). We advocate a more generic approach that would help model the behavior of various systems, populate these systems with various object bases, and execute various transactions on these databases.

Besides, the precision of the specifications of these simulation models varies widely. It is thus not always easy to reproduce these models from the published material. Hence, it appears beneficial to make use of a modelling methodology that allows, step by step, analyzing a system and specifying a formalized knowledge model that can be distributed and reused.

Finally, as far as we know, none of these models has been validated. The behavior of the studied algorithm, if it is implemented in a real system, is thus not guaranteed to be the same as in simulation, especially concerning performance results. Confronting simulated results with measurements performed in the same conditions on a real system is a good way to check whether a simulation model actually behaves like the system it models.

Considering these observations, our motivation is to propose a discrete-event random simulation model that addresses the issues of genericity, reusability and reliability. This model, named VOODB (Virtual Object-Oriented Database), is able to take into account different kinds of Client-Server architectures. It can also be parameterized to serve various purposes, e.g., to evaluate how a system reacts to different workloads or to evaluate the efficiency of optimization methods. Finally, VOODB has been validated by confronting simulation results with performance measures achieved on real systems (namely O2 and Texas).

The remainder of this paper is organized as follows. Section 2 introduces our modelling approach. Section 3 details the VOODB simulation model. Section 4 presents validation experiments for this model. We finally conclude this paper and provide future research directions in Section 5.

2 Modelling approach 

In order to clearly identify the interest of a structured approach, let us imagine that a simulation program is directly elaborated from informal knowledge concerning the studied system (Figure 1). Only experts mastering both the system to model and the target simulation language can satisfactorily use such an approach. It is thus only usable for one-off studies on relatively simple systems. The obtained simulation program is not meant to be reused or later modified, and its documentation is minimal at best.

Figure 1: Unstructured approach to simulation (informal knowledge about a system is directly coded into a dedicated simulation program)



Conversely, a structured approach first consists in translating informal knowledge into an organized knowledge model (Figure 2). This knowledge model rests on concepts close to those of the study domain. It may be more or less formalized, and must enable the systematic generation of a simulation program. This approach helps focus on the modelled system's properties and abstract away constraints related to the simulation environment. It facilitates feedback to improve simulation quality: it is possible to reconsider functioning hypotheses or detail some pieces by modifying the knowledge model and generating new code. Low-level parameters may be introduced (e.g., mean access time to a disk block). The workload model may be directly included into the knowledge model and may itself incorporate some parameters (e.g., the proportion of objects accessed within a given class). Specialists in simulation have long worked on defining the principles of such an approach [Sar79] [Nan81] [Sar91] [Bal92] [Gou92] [Kel97].
Figure 2: Structured modelling approach



The approach we recommend (Figure 3) is a generic extension of the former approach. It consists in broadening the study field to take into account a whole class of systems. The knowledge model must hence be tunable (e.g., high-level parameters may help select the system's architecture) and modular (some functionalities are included in specific modules that may be added or removed at will). The knowledge model, which is necessarily more complex, must be described in a hierarchical way down to the desired level of detail. We used the concepts and diagrams of UML [Rat97] to describe it.

We also propose that the workload model be characterized separately. It is then possible to reuse workload models from existing benchmarks (like HyperModel [And90], OO1 [Cat91] or OO7 [Car93]) or to establish a specific model. We chose to incorporate the workload model of the generic OCB (Object Clustering Benchmark) benchmark [Dar98]. Thanks to its numerous parameters, this workload model can be adapted to various situations (existing benchmark workloads, specific application workloads...).

The generic simulation program is obtained in a systematic way. Its modular architecture results from the two models it is based on. The final simulation program for a specific case study is obtained by instantiating this generic program. This approach guarantees good reusability. After a first simulation experiment, it is possible to broaden the scope of the study by changing the parameters' values (especially those concerning the workload), by selecting other modules (for instance, by replacing one clustering module with another), or by incorporating new modules.

3 The VOODB simulation model 

3.1 Knowledge model 

In our context, the knowledge model describes the execution of transactions in an OODBMS (Figure 4).

Transactions are generated by the Users, who submit them to the Transaction Manager. The Transaction Manager determines which objects need to be accessed for the current transaction, and performs the necessary operations on these objects. A given object is requested by the Transaction Manager from the Object Manager, which finds out which disk page contains the object. Then, the Object Manager requests the page from the Buffering Manager, which checks whether the page is present in the memory buffer. If not, the Buffering Manager requests the page from the I/O Subsystem, which deals with physical disk accesses. After an operation on a given object is over, the Clustering Manager may update some usage statistics for the database. An analysis of these statistics can trigger a reclustering, which is then performed by the Clustering Manager. Such a database reorganization can also be requested externally by the Users. The only treatments that differ when two distinct clustering algorithms are tested are those performed by the Clustering Manager. Other treatments in the model remain the same, whether clustering is used or not, and whatever the clustering strategy.
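The request chain just described can be sketched in Python as follows. This is a minimal illustration, not the DESP-C++ implementation of VOODB: the class and method names are hypothetical, the buffer is assumed to use LRU replacement (the LRU-1 default of the PGREP parameter in Table 3), the Clustering Manager is omitted, and only page reads are counted as I/Os.

```python
from collections import OrderedDict


class IOSubsystem:
    """Performs physical disk accesses; here it only counts page reads."""
    def __init__(self):
        self.reads = 0

    def read_page(self, page):
        self.reads += 1
        return page


class BufferingManager:
    """Fixed-size memory buffer with LRU page replacement (assumption)."""
    def __init__(self, capacity, io):
        self.capacity = capacity
        self.io = io
        self.pages = OrderedDict()   # page id -> None, least recently used first

    def get_page(self, page):
        if page in self.pages:                # hit: refresh recency
            self.pages.move_to_end(page)
            return page
        self.io.read_page(page)               # miss: ask the I/O Subsystem
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # evict least recently used page
        self.pages[page] = None
        return page


class ObjectManager:
    """Finds which disk page contains a given object."""
    def __init__(self, placement, buffering):
        self.placement = placement            # object id -> page id
        self.buffering = buffering

    def get_object(self, obj):
        return self.buffering.get_page(self.placement[obj])


class TransactionManager:
    """Requests every object accessed by a transaction."""
    def __init__(self, objects):
        self.objects = objects

    def run(self, transaction):
        for obj in transaction:
            self.objects.get_object(obj)


# Usage: a 2-page buffer and a transaction touching objects on 3 pages.
io = IOSubsystem()
buf = BufferingManager(capacity=2, io=io)
om = ObjectManager({0: 0, 1: 0, 2: 1, 3: 2}, buf)
tm = TransactionManager(om)
tm.run([0, 1, 2, 3, 0])
print(io.reads)
```

Each manager only talks to the next one down the chain, which mirrors how each active resource in the knowledge model hands a request to the following swimlane.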

The knowledge model is hierarchical. Each of its activities (rounded boxes) can be further detailed, as illustrated in Figure 5 for the "Access Disk" functioning rule.

The system's physical resources that appear as swimlanes in the knowledge model may be qualified as active resources since they actually perform some task. However, the system also includes passive resources that do not directly perform any task, but are used by the active resources to perform theirs. These passive resources do not appear in Figure 4 but must nevertheless be exhaustively listed (Table 1).

3.2 Evaluation model 
3.2.1 Simulator selection 

We first selected the QNAP2 (Queuing Network Analysis Package, 2nd generation, version 9) discrete-event random simulation software [Sim95] to implement VOODB, because it offers the following essential features:

• QNAP2 is a validated and reliable simulation tool;

• QNAP2 allows the use of an object-oriented approach (since version 6);

• QNAP2 includes a full algorithmic language, derived from Pascal, which allows a relatively easy implementation of complex algorithms (object clustering, buffer page replacement, prefetching, etc.).

Figure 3: Generic, structured modelling approach

Processor and main memory in a centralized architecture, or server processor and main memory in a Client-Server architecture
Clients' processor and main memory in a Client-Server architecture
Server disk controller and secondary storage
Database (its concurrent access is managed by a scheduler that applies a transaction scheduling policy depending on the multiprogramming level)

Table 1: VOODB passive resources

However, QNAP2 is an interpreted language. Models written in QNAP2 are hence much slower at execution time than if they were written in a compiled language. Therefore, we could not carry out the intensive simulation campaign we had intended. For instance, the simplest simulation experiments (without clustering) took 8 hours, while the most complex took more than one week. Thus, we could not gain much insight beyond basic results.

Figure 5: "Access disk" functioning rule detail

We eventually considered the use of C++, which is both an object-oriented and a compiled language. This also allowed us to reuse most of the OCB benchmark's C++ code. But the existing C++ simulation packages were either not completely validated, or featured much more than we actually needed and were hence becoming as complicated to use as general simulation languages, or were not free. Hence, we decided to design our own C++ simulation kernel, named DESP-C++ (Discrete-Event Simulation Package for C++). Its main characteristics are validity, simplicity and efficiency. DESP-C++ has been validated by comparing the results of several simulation experiments conducted with DESP-C++ and QNAP2. Simulation experiments are now 20 to 1,000 times quicker with DESP-C++, depending on the model's complexity (the more complex a model is, the more poorly QNAP2 performs).
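The core of any such kernel is a simulation clock and a time-ordered list of pending events. The Python sketch below shows that mechanism under stated assumptions; it is not the DESP-C++ API (the `Simulator` class and its `schedule`/`run` methods are hypothetical names), and delays are fixed here whereas a random simulation model would draw them from probability distributions.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete-event kernel: a simulation clock plus a
    time-ordered list of pending events (hypothetical, not DESP-C++)."""

    def __init__(self):
        self.now = 0.0
        self._events = []               # heap of (date, sequence, action)
        self._seq = itertools.count()   # tie-breaker: FIFO order for equal dates

    def schedule(self, delay, action):
        """Schedule `action(simulator)` to occur `delay` time units from now."""
        heapq.heappush(self._events, (self.now + delay, next(self._seq), action))

    def run(self, until=float("inf")):
        """Pop events in chronological order, advancing the clock each time."""
        while self._events and self._events[0][0] <= until:
            date, _, action = heapq.heappop(self._events)
            self.now = date
            action(self)


# Usage: each transaction leaves 10 time units after it arrives.
# Contention is ignored; a real model would queue on a resource.
log = []

def arrival(sim):
    log.append(("arrival", sim.now))
    sim.schedule(10.0, departure)

def departure(sim):
    log.append(("departure", sim.now))

sim = Simulator()
sim.schedule(0.0, arrival)
sim.schedule(4.0, arrival)
sim.run()
print(log)
```

The sequence counter in each heap entry guarantees that two events with the same date are processed in the order they were scheduled, and that callables are never compared.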

3.2.2 Knowledge model translation 

Once the knowledge model is designed, it can be almost automatically translated into an evaluation model using any environment, whether a general simulation language or a usual programming language. Each entity in the knowledge model appears in some form in the evaluation model. In an object-oriented environment, resources (active and passive) become instantiated classes, and functioning rules are translated into methods.

Figure 4: Knowledge model

More precisely, the translation from the knowledge model to the evaluation model proceeds as follows:

• each active resource (swimlanes in Figure 4) becomes a component of the simulation program (i.e., a class);

• each object (square boxes in Figure 4) becomes an interface to these components (i.e., it is used as a parameter in messages between two classes);

• each activity (rounded boxes in Figure 4) becomes a method within a component.

Passive resources are classes bearing mainly two methods: one to reserve the resource and another one to release it.
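A passive resource of this kind can be sketched in Python as follows. This is an illustration, not VOODB code: the class name, the FIFO waiting discipline and the use of a capacity equal to a multiprogramming level (as for the Database resource in Table 1) are assumptions.

```python
from collections import deque


class PassiveResource:
    """Passive resource with a fixed number of units and a FIFO waiting queue
    (hypothetical sketch)."""

    def __init__(self, name, capacity=1):
        self.name = name
        self.free = capacity
        self.waiting = deque()

    def reserve(self, client):
        """Grant one unit to `client` if available, otherwise queue it.
        Returns True when the unit is granted immediately."""
        if self.free > 0:
            self.free -= 1
            return True
        self.waiting.append(client)
        return False

    def release(self):
        """Free one unit. If clients are waiting, the unit is handed directly
        to the oldest one, which is returned; otherwise None is returned."""
        if self.waiting:
            return self.waiting.popleft()   # unit passes on, `free` unchanged
        self.free += 1
        return None


# Hypothetical use: the database with a multiprogramming level of 2.
db = PassiveResource("Database", capacity=2)
granted = [db.reserve(t) for t in ("T1", "T2", "T3")]
next_holder = db.release()   # T3 takes over the freed unit
print(granted, next_holder)
```

Handing a released unit directly to the next waiting client, rather than incrementing the free count first, prevents a newly arriving client from overtaking one that is already queued.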

Table 2 summarizes how entities from the knowledge model are translated in QNAP2 and DESP-C++, which both use a resource view (where the behavior of each active resource is described). Table 2 also provides a translation in SLAM II [Pri86], which uses a transaction view (where the specification concerns the operations undergone by the entities flowing through the system). This simply shows that VOODB can also be implemented with a simulator using the transaction view.

3.3 Genericity in VOODB 

Genericity in VOODB is primarily achieved through a set of parameters that help tune the model to a variety of configurations, and set up the different policies influencing the eventual behavior of an instance of the generic evaluation model. VOODB also benefits from the genericity of the OCB benchmark [Dar98] at the workload level, since OCB is itself tunable through a thorough set of 26 parameters. The parameters defining an instance of the VOODB evaluation model are presented in Table 3. Each active resource is associated with a set of parameters. These parameters are normally deduced directly from the studied system's specifications. However, some parameters are not always readily available and have to be worked out from benchmarks or measures (e.g., to determine network throughput or disk performance).

Table 2: Translation of the knowledge model entities

Our generic model allows simulating the behavior of different types of OODBMSs. It is in particular adapted to the different configurations of Client-Server architectures, which are nowadays the standard in OODBs. Our model is especially suitable for page server systems (like ObjectStore [Lam91] or O2 [Deu91]), but can also be used to model object server systems (like ORION [Kim88] or ONTOS [And91]), database server systems, or even multiserver hybrid systems (like GemStone [Ser92]). The organization of the VOODB components is controlled by the "System class" parameter.

4 Validation experiments 
4.1 Experiments scope 

Though we use validated tools (QNAP2 or DESP-C++), the results provided by simulation are not guaranteed to be consistent with reality. To check whether our simulation models were indeed valid, we simulated the behavior of two systems that offer object persistence: O2 [Deu91] and Texas [Sin92]. We compared these results to those provided by benchmarking these real systems with OCB. The objective here was to use the same workload model in both sets of experiments.

In a second step, we sought to evaluate the impact of an optimization method (the DSTC clustering technique [Bul96], which has been implemented in Texas). We again compared results obtained by simulation with direct measures performed under the same conditions on the real system.

Due to space constraints, we only present our most significant results here. Besides, our goal is not to perform sound performance evaluations of O2, Texas and DSTC. We just seek to show that our simulation approach can provide trustworthy results.

4.2 Experimental conditions 

4.2.1 Real systems 

The O2 server we used (version 5.0) is installed on an IBM RISC 6000 43P240 biprocessor workstation. Each processor is a PowerPC 604e running at 166 MHz. The workstation has 1 GB of ECC RAM. Its operating system is AIX version 4. The O2 server cache size is 16 MB by default.

The version of Texas we use is a prototype (version 0.5) running on a Pentium-II 266 PC with 64 MB of SDRAM, whose operating system is Linux, version 2.0.30. The swap partition size is 64 MB. DSTC is integrated in Texas as a collection of new modules and modifications of several Texas modules. Texas and the additional DSTC modules were compiled using the GNU C++ (version 2.7.2.1) compiler.

4.2.2 Simulation 

Our C++ simulation models were compiled with the GNU C++ (version 2.7.2.1) compiler. They run on a Pentium-II 266 PC with 64 MB of SDRAM, under Windows 95.

In order to simulate the behavior of O2 and Texas, VOODB has been parameterized as shown in Table 4. These parameters were all set from the specification and configuration of the hardware and software systems we used.

Our simulation results have been achieved with 95% confidence intervals ($c = 0.95$). To determine these intervals, we used the method described in [Ban96]. For given observations, the sample mean $\bar{X}$ and the sample standard deviation $\sigma$ are computed. The half-interval width $h$ is $h = t_{n-1,\,1-\alpha/2} \cdot \sigma / \sqrt{n}$, where $t$ is given by the Student t-distribution, $n$ is the number of replications and $\alpha = 1 - c$. The mean value belongs to the confidence interval $[\bar{X} - h, \bar{X} + h]$ with probability $c = 0.95$.
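The half-width computation can be sketched in Python as follows. The function name and the pilot observations are hypothetical; the Student quantile is passed in by the caller because the Python standard library has no t-distribution quantile function (for $n = 10$ replications and $c = 0.95$, $t_{9,\,0.975} \approx 2.262$).

```python
import math
import statistics


def confidence_interval(observations, t_quantile):
    """Return (mean, half_width) so that the interval is [mean - h, mean + h].

    h = t * s / sqrt(n), where s is the sample standard deviation and
    t_quantile is the Student value t_{n-1, 1-alpha/2} supplied by the caller.
    """
    n = len(observations)
    mean = statistics.mean(observations)
    s = statistics.stdev(observations)     # sample (n - 1) standard deviation
    h = t_quantile * s / math.sqrt(n)
    return mean, h


# Hypothetical pilot study: 10 replications of a mean number of I/Os.
pilot = [102, 98, 101, 97, 103, 99, 100, 104, 96, 100]
mean, h = confidence_interval(pilot, 2.262)
print(mean, round(h, 3))
```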



System
  System class (SYSCLASS): range {Centralized | Object Server | Page Server | DB Server | Other}; default Page Server
  Network throughput (NETTHRU): default 1 MB/s
Buffering Manager
  Disk page size (PGSIZE): range {512 | 1024 | 2048 | 4096} bytes; default 4096 bytes
  Buffer size (BUFFSIZE): default 500 pages
  Buffer page replacement strategy (PGREP): range {RANDOM | FIFO | LFU | LRU-K | CLOCK | GCLOCK | Other}; default LRU-1
  Prefetching policy (PREFETCH): range {None | Other}; default None
Clustering Manager
  Object clustering policy (CLUSTP): range {None | Other}; default None
  Objects initial placement (INITPL): range {Sequential | Optimized sequential | Other}; default Optimized sequential
I/O Subsystem
  Disk search time (DISKSEA): default 7.4 ms
  Disk latency time (DISKLAT): default 4.3 ms
  Disk transfer time (DISKTRA): default 0.5 ms
Transaction Manager
  Multiprogramming level (MULTILVL): default 10
  Lock acquisition time (GETLOCK): default 0.5 ms
  Lock release time (RELLOCK): default 0.5 ms
Users
  Number of users (NUSERS): default 1

Table 3: VOODB parameters
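One way to carry such a parameter set in code is a record holding the Table 3 defaults, with range checks on the enumerated parameters. The Python sketch below is hypothetical (the class name, the field names, which mirror the parameter codes, and the subset of parameters covered are our choices). Its `disk_access_time` method assumes that one page read costs search + latency + transfer time; the exact "Access Disk" rule of Figure 5 is not reproduced here.

```python
from dataclasses import dataclass

SYSCLASS_RANGE = {"Centralized", "Object Server", "Page Server", "DB Server", "Other"}
PGSIZE_RANGE = {512, 1024, 2048, 4096}


@dataclass
class VOODBParameters:
    """Hypothetical container for part of Table 3, with its default values.
    Field names mirror the parameter codes; times are in milliseconds."""
    sysclass: str = "Page Server"
    pgsize: int = 4096          # bytes
    buffsize: int = 500         # pages
    pgrep: str = "LRU-1"
    disksea: float = 7.4        # ms
    disklat: float = 4.3        # ms
    disktra: float = 0.5        # ms
    multilvl: int = 10
    nusers: int = 1

    def __post_init__(self):
        # Reject values outside the ranges listed in Table 3.
        if self.sysclass not in SYSCLASS_RANGE:
            raise ValueError(f"unknown system class: {self.sysclass}")
        if self.pgsize not in PGSIZE_RANGE:
            raise ValueError(f"unsupported page size: {self.pgsize}")

    def disk_access_time(self):
        """Assumed cost of one page read: search + latency + transfer (ms)."""
        return self.disksea + self.disklat + self.disktra

    def buffer_bytes(self):
        """Memory occupied by a full buffer, in bytes."""
        return self.buffsize * self.pgsize


params = VOODBParameters()
print(round(params.disk_access_time(), 1))   # 12.2
print(params.buffer_bytes())                 # 2048000
```

Under these assumptions, a default buffer of 500 pages of 4096 bytes holds about 2 MB, and each buffer miss costs about 12.2 ms of disk time.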



Since we wish to be within 5% of the sample mean with 95% confidence, we first performed a pilot study with $n = 10$. Then we computed the number of necessary additional replications $n^*$ using the equation $n^* = n \cdot (h / h^*)^2$, where $h$ is the half-width of the confidence interval for the pilot study and $h^*$ the half-width of the confidence interval for all replications (the desired half-width).
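As a worked example of this equation, the Python sketch below computes $n^*$ for hypothetical pilot values, following the equation as written above. The function name and the numbers are illustrative, and rounding up to a whole number of replications is our assumption.

```python
import math


def additional_replications(n, h, h_star):
    """n* = n * (h / h*)^2, rounded up to a whole number of replications."""
    return math.ceil(n * (h / h_star) ** 2)


# Hypothetical pilot: n = 10, sample mean 200 I/Os, half-width h = 25 I/Os.
n, mean, h = 10, 200.0, 25.0
h_star = 0.05 * mean          # target: within 5% of the sample mean
n_star = additional_replications(n, h, h_star)
print(n_star)
```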

Our simulation results showed that the required precision was achieved for all our performance criteria when $n + n^* > 100$, with a broad safety margin. We thus performed 100 replications in all our experiments. To keep the following figures readable, we did not include the confidence intervals. They are, however, computed by default by DESP-C++.

4.3 Experiments on O2 and Texas

First, we investigated the effects of the object base size (number of classes and number of instances in the database) on the performance (mean number of I/Os necessary to perform the transactions) of the studied systems. In this series of experiments, the number of classes in the schema (NC) is 20 or 50, and the number of instances (NO) varies from 500 to 20,000. The workload configuration is shown in Table 5. The other OCB parameters were set to their default values.

In a second step, we varied the server cache size (O2) or the available main memory (Texas) in order to study the effects on performance (mean number of I/Os). The objective was also to simulate the system's reaction when the (memory size / database size) ratio decreases. In the case of O2, the server cache size is specified by environment variables. Our Texas version runs under Linux, which allows setting the memory size at boot time. Cache or main memory size varied from 8 MB to 64 MB in these experiments. Database size was fixed (NC = 50, NO = 20,000); we reused the workload from Table 5, and the other OCB parameters were set to their default values.

4.3.1 Results concerning O2

Database size variation 

Figures 6 and 7 show how the performance of O2 varies in terms of number of I/Os when the number of classes and the number of instances in the database vary. We can see that simulation results differ slightly in absolute value from the results measured on the real system, but that they clearly show the same tendency. The behavior of VOODB indeed conforms to reality.

Cache size variation 

The results obtained in this experiment in terms of number of I/Os are presented in Figure 8. They show that the performance of O2 degrades rapidly when the database size (about 28 MB on average) becomes greater than the cache size. This decrease in performance is linear. Figure 8 also shows that the performance of O2 can again be reproduced by our simulation model.

Table 4: Parameters defining the O2 and the Texas systems within VOODB

COLDN: Number of transactions (cold run)
HOTN: Number of transactions (warm run) = 1000
PSET: Set-oriented access occurrence probability = 0.25
SETDEPTH: Set-oriented access depth = 3
PSIMPLE: Simple traversal occurrence probability = 0.25
SIMDEPTH: Simple traversal access depth = 3
PHIER: Hierarchy traversal occurrence probability = 0.25
HIEDEPTH: Hierarchy traversal access depth = 5
PSTOCH: Stochastic traversal occurrence probability = 0.25
STODEPTH: Stochastic traversal access depth = 50

Table 5: OCB workload definition
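The transaction mix of Table 5 can be read as a discrete probability distribution over four traversal types. The Python sketch below only illustrates how a workload generator might draw the transaction type and its depth for a warm run of HOTN = 1000 transactions. It is not OCB code: the dictionary layout and function name are hypothetical, and the traversals themselves are not performed.

```python
import random

# Transaction types from Table 5: (occurrence probability, access depth).
WORKLOAD = {
    "set-oriented access": (0.25, 3),    # PSET, SETDEPTH
    "simple traversal": (0.25, 3),       # PSIMPLE, SIMDEPTH
    "hierarchy traversal": (0.25, 5),    # PHIER, HIEDEPTH
    "stochastic traversal": (0.25, 50),  # PSTOCH, STODEPTH
}


def draw_transaction(rng):
    """Pick a transaction type according to its occurrence probability
    and return it together with its access depth."""
    kinds = list(WORKLOAD)
    weights = [WORKLOAD[k][0] for k in kinds]
    kind = rng.choices(kinds, weights=weights, k=1)[0]
    return kind, WORKLOAD[kind][1]


rng = random.Random(42)                                  # reproducible stream
hot_run = [draw_transaction(rng) for _ in range(1000)]   # HOTN = 1000
counts = {k: sum(1 for kind, _ in hot_run if kind == k) for k in WORKLOAD}
print(counts)
```

Seeding the generator makes each replication reproducible; independent replications would use different seeds.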

4.3.2 Results concerning Texas 

Database size variation 

Figures 9 and 10 show how the performance of Texas varies in terms of number of I/Os when the number of classes and the number of instances in the database vary. As with O2, we can see that simulation results and results measured on the real system differ slightly in absolute value, but that they show the same tendency.

Memory size variation 

Since Texas uses the virtual memory mechanisms of the operating system, we studied the effects of a decrease in available main memory size under Linux. The results obtained in terms of number of I/Os are presented in Figure 11. They show that the performance of Texas degrades rapidly when the main memory size becomes smaller than the database size (about 21 MB on average). This degradation is due to Texas' object loading policy, which causes numerous pages to be reserved in memory even before they are actually loaded. This process is clearly exponential and generates costly swapping, which becomes a greater hindrance the smaller the main memory is. The simulation results provided by VOODB still conform to reality.

4.4 Effects of DSTC on the performance of Texas

We highlighted DSTC's clustering capability by placing the algorithm in favorable conditions. To this end, we ran very characteristic transactions (namely, depth-3 hierarchy traversals) and measured the performance of Texas before and after clustering. We also evaluated the clustering overhead. We checked that the behavior of DSTC was the same in our simulation model and in the real system by counting the number of created clusters and their mean size.

This experiment has been performed on a mid-sized database (50 classes, 20,000 instances, about 20 MB on average). We had also planned to perform this experiment on a large object base, but we encountered technical problems with Texas/DSTC. To bypass these problems, we reduced the main memory size from 64 MB to 8 MB, so that the database is large compared to the main memory. Then, we reused the mid-sized object base from the first series of experiments. The other OCB parameters were set to their default values.

Figure 6: Mean number of I/Os depending on number of instances (O2 - 20 classes)

Figure 7: Mean number of I/Os depending on number of instances (O2 - 50 classes)

Figure 9: Mean number of I/Os depending on number of instances (Texas - 20 classes)

Table 6 presents the numbers of I/Os achieved on the real system and in simulation for the mid-sized database. It shows that DSTC allows substantial performance improvements (a performance gain of around a factor of 5). Clustering overhead is high, though. Furthermore, the simulation results are overall consistent with the performance measurements done on the real system, except for clustering overhead, which is far lower in simulation than in reality.
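The derived figures of Table 6 can be recomputed from its mean numbers of I/Os, as in the Python sketch below: the Ratio column is the benchmark value divided by the simulated value, and the gain is the ratio of pre-clustering to post-clustering I/Os. The printed table values appear to be truncated rather than rounded (1890.70 / 330.60 ≈ 5.719 is listed as 5.71), so the sketch truncates; this reading of the table is our assumption, and the variable names are hypothetical.

```python
import math

# Mean numbers of I/Os from Table 6: benchmark (Texas) vs. simulation (VOODB).
bench = {"pre": 1890.70, "overhead": 12799.60, "post": 330.60}
sim = {"pre": 1878.80, "overhead": 354.50, "post": 350.50}


def truncate(x, digits):
    """Drop (rather than round) decimals beyond `digits`."""
    factor = 10 ** digits
    return math.floor(x * factor) / factor


# Ratio column: benchmark value divided by simulated value.
ratios = {row: bench[row] / sim[row] for row in bench}
# Clustering gain: mean I/Os before clustering / mean I/Os after clustering.
gain_bench = bench["pre"] / bench["post"]   # about 5.719
gain_sim = sim["pre"] / sim["post"]         # about 5.360

print({row: truncate(r, 4) for row, r in ratios.items()})
print(truncate(gain_bench, 2), truncate(gain_sim, 2))
```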

This flagrant inconsistency is not due to a bug in the simulation model, but to a particularity of Texas. Indeed, after reorganization of the database by DSTC, objects are moved to different disk pages. Hence, their OIDs change, because Texas uses physical OIDs. In order to maintain consistency among inter-object references, the whole database must be scanned and all references to moved objects must be updated. This phase, which is very costly both in terms of I/Os and time, does not exist in our simulation models, since they necessarily use logical OIDs.
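The difference between the two OID schemes can be illustrated with a toy Python sketch. This is neither Texas nor VOODB code: the dictionary-based store, the (page, slot) addresses and the function names are hypothetical, and the cost is counted in objects scanned or table entries touched rather than in I/Os.

```python
def move_with_physical_oids(objects, moves):
    """Physical OIDs: an OID is the object's (page, slot) address.
    Relocating objects changes their OIDs, so the whole database is
    scanned and every reference to a moved object is patched.
    Returns the number of objects scanned."""
    scanned = 0
    for oid in list(objects):
        scanned += 1
        objects[oid] = [moves.get(ref, ref) for ref in objects[oid]]
    for old, new in moves.items():
        objects[new] = objects.pop(old)   # assumes `new` is a free address
    return scanned


def move_with_logical_oids(oid_table, moves):
    """Logical OIDs: references hold stable OIDs and an OID table maps each
    OID to its current (page, slot). Only the moved entries are updated.
    Returns the number of table entries touched."""
    for oid, new_address in moves.items():
        oid_table[oid] = new_address
    return len(moves)


# Physical OIDs: the object at (2, 0) is clustered next to (0, 0) on page 0.
objects = {(0, 0): [(1, 0)], (1, 0): [(0, 0), (2, 0)], (2, 0): []}
scanned = move_with_physical_oids(objects, {(2, 0): (0, 1)})

# Logical OIDs: the same move only rewrites one OID-table entry;
# references between objects ("a" -> "b", "b" -> "c") are untouched.
oid_table = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
touched = move_with_logical_oids(oid_table, {"c": (0, 1)})
print(scanned, touched)
```

With physical OIDs the work grows with the size of the whole database, whereas with logical OIDs it grows only with the number of moved objects, at the price of one extra indirection per object access.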

To simulate DSTC's behavior within Texas in a wholly faithful way, it would have been easy to take this conversion time into account in our simulations. However, we preferred to keep our initial results in order to underline the difficulty of implementing a dynamic clustering technique within a persistent object store using physical OIDs. On the other hand, our simulations show that such a dynamic technique is perfectly viable in a system with logical OIDs.

The number of clusters built by the DSTC method and these clusters' average size are presented in Table 7. We can observe again that there are few differences between the real system's behavior and its simulated behavior with VOODB.

Figure 10: Mean number of I/Os depending on number of instances (Texas - 50 classes)

                            Bench.      Sim.     Ratio
Pre-clustering usage       1890.70   1878.80    1.0063
Clustering overhead       12799.60    354.50   36.1060
Post-clustering usage       330.60    350.50    0.9432
Gain                          5.71      5.36    1.0652

Table 6: Effects of DSTC on the performance (mean number of I/Os) - Mid-sized base

                                  Bench.      Sim.     Ratio
Mean number of clusters            82.23     84.01    0.9788
Mean number of obj./clust.         12.83     13.73    0.9344

Table 7: DSTC clustering

Figure 11: Mean number of I/Os depending on memory size (Texas)

Finally, Table 8 presents the number of I/Os performed on the real system and by simulation for the "large" database. It shows that simulation results are still consistent with the performances observed on the real system. Furthermore, the gain induced by clustering is much higher when the database does not wholly fit into main memory (it rises from a factor of about 5 to a factor of about 30). This result was foreseeable: the smaller the memory, the more page replacements the system must perform, so unused pages normally remain in memory only a short time. Good object clustering is thus more useful in these conditions. The clustering overhead is not reported here, since we reused the object base (in its initial and clustered states) from the first series of experiments.
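The argument that clustering pays off more when memory is scarce can be illustrated with a toy LRU buffer in Python. The page layout, the workload and the buffer sizes are hypothetical, and LRU replacement is an assumption (the text does not state which policy Texas uses); the numbers are not meant to reproduce Table 8.

```python
from collections import OrderedDict

PAGE_SIZE = 8  # objects per page (hypothetical)
N_PAGES = 8    # the whole toy base occupies 8 pages


def count_ios(trace, page_of, capacity):
    """Count page reads (I/Os) for an object access trace under an LRU buffer
    that holds at most `capacity` pages."""
    buffer = OrderedDict()  # resident pages, least recently used first
    ios = 0
    for oid in trace:
        page = page_of(oid)
        if page in buffer:
            buffer.move_to_end(page)        # hit: now most recently used
        else:
            ios += 1                        # miss: read the page from disk
            buffer[page] = True
            if len(buffer) > capacity:
                buffer.popitem(last=False)  # evict the least recently used page
    return ios


def unclustered(oid):
    """Objects stored in creation order."""
    return oid // PAGE_SIZE


def clustered(oid):
    """Objects that are traversed together stored on the same page."""
    return oid % PAGE_SIZE


# Hypothetical workload: group g = objects g, g+8, ..., g+56, always traversed
# together; three passes over groups 0 to 3.
trace = [g + PAGE_SIZE * k for _ in range(3) for g in range(4) for k in range(N_PAGES)]

for capacity in (N_PAGES, N_PAGES // 2):  # base fits / only half of it fits
    u = count_ios(trace, unclustered, capacity)
    c = count_ios(trace, clustered, capacity)
    print(f"{capacity} pages: {u} I/Os unclustered, {c} clustered, gain {u / c:.0f}")
```

With these assumptions the gain grows from 2 when the base fits in the buffer to 24 when only half of it fits: without clustering, each traversal cycles through more pages than the buffer can hold, so every access misses.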



5 Conclusion 

In this paper, we presented a generic discrete-event random simulation model, VOODB, designed to evaluate the performances of OODBs. VOODB is parameterized and modular, and can thus be adapted to various purposes. It allows the simulation of various types of OODBMSs and can capture the performance improvements achieved by optimization methods. Such optimization methods can be included in VOODB as interchangeable modules. Furthermore, the workload model adopted in VOODB (the OCB benchmark) can also be replaced by another existing benchmark or by a specific workload. VOODB may be used as is (its C++ code is freely available) or tailored to fit particular needs.
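As a rough illustration of what a discrete-event random simulation with an interchangeable module can look like, here is a small Python sketch. It is hypothetical and is neither VOODB's C++ code nor the DESP-C++ API: transactions arrive with exponential inter-arrival times and share a single disk served in FIFO order, and a pluggable function giving the number of I/Os per transaction stands in for an optimization module such as clustering.

```python
import heapq
import random


def simulate(n_transactions, ios_per_transaction, seed=0,
             mean_interarrival=10.0, io_time=1.0):
    """Discrete-event random simulation of transactions sharing one disk.

    `ios_per_transaction(rng)` is the interchangeable module: it returns how
    many page I/Os a transaction needs. Returns the mean response time
    (queueing delay plus disk service time)."""
    rng = random.Random(seed)   # seeded, so runs are reproducible
    events = []                 # heap of (time, seq, kind, txn_id)
    seq = 0                     # tie-breaker for simultaneous events
    t = 0.0
    for txn in range(n_transactions):  # pre-schedule random arrivals
        t += rng.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, seq, "arrive", txn))
        seq += 1
    disk_free_at = 0.0
    arrival = {}
    total_response = 0.0
    while events:  # process events in time order
        now, _, kind, txn = heapq.heappop(events)
        if kind == "arrive":
            arrival[txn] = now
            start = max(now, disk_free_at)  # wait while the disk is busy
            disk_free_at = start + ios_per_transaction(rng) * io_time
            heapq.heappush(events, (disk_free_at, seq, "depart", txn))
            seq += 1
        else:
            total_response += now - arrival[txn]
    return total_response / n_transactions


# Two hypothetical modules compared under the same random arrival stream.
print(simulate(1000, lambda rng: rng.randint(5, 15)))
print(simulate(1000, lambda rng: rng.randint(1, 3)))
```

Because all arrivals are drawn before any I/O count, a given seed yields the same arrival stream for every module, so the two runs differ only in the plugged-in behavior.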

We have illustrated the genericity of VOODB and suggested its validity by setting its parameters to simulate the behavior of the O2 OODB and the Texas persistent object store. We correlated the simulated performances of both systems with actual performance measures of the real systems (obtained with the OCB benchmark) and observed that they matched. The effects of the DSTC clustering technique on Texas' performances have also been reproduced by simulation.

VOODB may be used for several purposes. The performances of one or several optimization algorithms may be evaluated in many different conditions. For instance, the host OODB or OS can vary, to see how a given algorithm behaves. Clustering strategies may also be compared to each other that way. Furthermore, simulation has a low cost, since the different simulated systems (hardware, OS, OODBs) do not need to be acquired: their specifications are enough. Finally, we can model the behavior of new systems a priori, test their performances, analyze the simulation results, and improve them (and then reiterate the process).

VOODB itself was obtained through the application of a modelling methodology that led to the design of a generic knowledge model and a generic evaluation model. This approach ensured that the specifications of the simulation models were precise enough for our purposes and that the evaluation model was properly translated from the knowledge model. It is also possible to reuse our knowledge model to produce simulation programs in simulation languages or environments other than QNAP2 or DESP-C++.

                            Bench.      Sim.     Ratio
Pre-clustering usage      12504.60  12547.80    0.9965
Post-clustering usage       424.30    441.50    0.9610
Gain                         29.47     28.42    1.0369

Table 8: Effects of DSTC on the performances (mean number of I/Os) - "Large" base

The reusability of VOODB may be important in a context where publishing performance results is restricted. Since benchmarkers can encounter serious legal problems with OODB vendors if they publish performance studies [Car93], it can be helpful to have a tool for performing private performance evaluations.

Future work on this study first consists of performing intensive simulation experiments with DSTC, since we only have basic results so far. It would be interesting to determine the right values of DSTC's parameters in various conditions. We also plan to evaluate the performances of other optimization techniques, such as the clustering strategy proposed by [Gay97], which has also recently been implemented in Texas. This clustering technique originates from a collaboration between the University of Oklahoma and Blaise Pascal University. The ultimate goal is to compare different clustering strategies, to determine which one performs best in a given set of conditions.

Though simulation may be used as a substitute for benchmarking (mainly for a priori performance evaluations), it may also be used as a complement to benchmarking. For instance, a mixed benchmarking-simulation approach may be used to measure some performance criteria that require precision by experimentation, and other criteria by simulation (e.g., to determine the best architecture for a given purpose). With such an approach, using the same workload (e.g., OCB) in simulation and on the real system is essential.

The VOODB simulation model could also be improved to include more components that influence the performances of OODBs. For instance, it currently provides only a few basic buffering strategies (RANDOM, FIFO, LFU, LRU-K, CLOCK...) and no prefetching strategy, although prefetching has also been shown to strongly influence the performances of OODBs [Bul96].
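Since CLOCK is one of the buffering strategies listed above, here is a minimal Python sketch of the textbook CLOCK (second-chance) replacement policy for reference. It is not taken from VOODB; the class and method names are hypothetical, and this variant sets a page's reference bit when the page is loaded.

```python
class ClockBuffer:
    """CLOCK (second-chance) page replacement over a fixed number of frames."""

    def __init__(self, n_frames):
        self.frames = [None] * n_frames  # page held by each frame
        self.ref = [False] * n_frames    # reference bit of each frame
        self.where = {}                  # page -> frame index
        self.hand = 0                    # current position of the clock hand
        self.ios = 0                     # number of page reads (misses)

    def access(self, page):
        """Access a page; returns True on a hit, False on a miss (one I/O)."""
        if page in self.where:
            self.ref[self.where[page]] = True  # hit: give the page a second chance
            return True
        self.ios += 1
        # Advance the hand, clearing reference bits, until an unreferenced frame is found.
        while self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.where[page] = self.hand
        self.ref[self.hand] = True  # this variant marks newly loaded pages
        self.hand = (self.hand + 1) % len(self.frames)
        return False


buf = ClockBuffer(3)
print([buf.access(p) for p in [1, 2, 3, 1, 4, 2, 5]], buf.ios)
```

With three frames, the trace 1, 2, 3, 1, 4, 2, 5 causes five misses; when page 5 arrives, page 2 survives because it was re-referenced, and page 3 is evicted instead.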

VOODB could even be extended to take into account completely different aspects of performance in OODBs, such as concurrency control or query optimization. VOODB could also take into account random hazards, such as benign or serious system failures, in order to observe how the studied OODB behaves and recovers in critical conditions. Such features could be included in VOODB as new modules.

Lastly, to make reuse easier and more formal, VOODB could be rebuilt as part of a reusable model library, as modular fragments that could be assembled to form bigger models. To this end, slicing the model into fragments is not enough: the structure and interface of each module must also be standardized, and explicit documentation must be provided for every submodel [Bre98].